Mining GO Annotations for Improving Annotation Consistency

نویسندگان

  • Daniel Faria
  • Andreas Schlicker
  • Catia Pesquita
  • Hugo Bastos
  • António E. N. Ferreira
  • Mario Albrecht
  • André O. Falcão
چکیده

Despite the structure and objectivity provided by the Gene Ontology (GO), the annotation of proteins is a complex task that is subject to errors and inconsistencies. Electronically inferred annotations in particular are widely considered unreliable. However, given that manual curation of all GO annotations is unfeasible, it is imperative to improve the quality of electronically inferred annotations. In this work, we analyze the full GO molecular function annotation of UniProtKB proteins, and discuss some of the issues that affect their quality, focusing particularly on the lack of annotation consistency. Based on our analysis, we estimate that 64% of the UniProtKB proteins are incompletely annotated, and that inconsistent annotations affect 83% of the protein functions and at least 23% of the proteins. Additionally, we present and evaluate a data mining algorithm, based on the association rule learning methodology, for identifying implicit relationships between molecular function terms. The goal of this algorithm is to assist GO curators in updating GO and correcting and preventing inconsistent annotations. Our algorithm predicted 501 relationships with an estimated precision of 94%, whereas the basic association rule learning methodology predicted 12,352 relationships with a precision below 9%.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A procedure for assessing GO annotation consistency

MOTIVATION The Gene Ontology (GO) is widely used to annotate molecular attributes of genes and gene products. Multiple groups undertaking functional annotations of genomes contribute their annotation sets to the GO database resource and these data are subsequently used in comparative functional analysis research. Although GO curators adhere to the same protocols and standards while assigning GO...

متن کامل

The Effects of Multimedia Annotations on Iranian EFL Learners’ L2 Vocabulary Learning

In our modern technological world, Computer-Assisted Language learning (CALL) is a new realm towards learning a language in general, and learning L2 vocabulary in particular. It is assumed that the use of multimedia annotations promotes language learners’ vocabulary acquisition. Therefore, this study set out to investigate the effects of different multimedia annotations (still picture annotatio...

متن کامل

Iranian EFL Learners L2 Reading Comprehension: The Effect of Online Annotations via Interactive White Boards

This study explores the effect of online annotations via Interactive White Boards (IWBs) on reading comprehension of Iranian EFL learners. To this aim, 60 students from a language institute were selected as homogeneous based on their performance on Oxford Placement Test (2014).Then, they were randomly assigned to 3 experimental groups of 20, and subsequently exposed to the research treatment af...

متن کامل

GOAnnotator: linking protein GO annotations to evidence text

BACKGROUND Annotation of proteins with gene ontology (GO) terms is ongoing work and a complex task. Manual GO annotation is precise and precious, but it is time-consuming. Therefore, instead of curated annotations most of the proteins come with uncurated annotations, which have been generated automatically. Text-mining systems that use literature for automatic annotation have been proposed but ...

متن کامل

The Gene Ontology Annotation database (GOA): A community resource of GO annotations to the UniProt Knowledgebase

The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) provides highquality electronic and manual annotations to the UniProt Knowledgebase using the standardized vocabulary of the Gene Ontology (GO). As a supplementary archive of GO annotation, GOA promotes a high level of integration of the knowledge represented in UniProt with other databases. This is achieved by converting Un...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 7  شماره 

صفحات  -

تاریخ انتشار 2012